Course Intro

Practical Computing Skills for Omics Data (PLNTPTH 5004)

Jelmer Poelstra

MCIC Wooster, Ohio State University

2025-08-26

Personal introductions

Introductions: Jelmer (instructor)

  • Lead of the CFAES Bioinformatics and Microscopy cores
    • Part of what was until recently called the Molecular & Cellular Imaging Center (MCIC)
    • We are now grouped under CFAES Analytical Resources, core facilities providing services in molecular biology, high-throughput sequencing, bioinformatics, microscopy, and soil analyses.

  • What I work on
    • The majority of my time is spent providing research assistance,
      working with grad students and postdocs on omics data
    • Teaching, such as this course, workshops, and Code Club (https://osu-codeclub.github.io)

  • Background in animal evolutionary genomics & speciation

  • In my free time, I enjoy bird watching – locally & all across the world

Introductions: TA / co-instructor

TBA

Introductions: You

  • Name

  • Lab and Department

  • Research interests and/or current research topics

  • Something about you that is not work-related, such as a hobby or fun fact

Course goals and background

The core goals of this course

Learning skills that will enable you to:

  • Do your research more reproducibly and efficiently (e.g. by using code)

  • Work with large-scale “omics” datasets and do applied bioinformatics


To do so, this course will focus primarily on what we may call “fundamental computational skills” rather than on specific applications. For example, you will learn to:

  • Code in the Unix shell, R and Nextflow languages
  • Organize, document, and manage your project data, code, and results
  • Work with a remote supercomputer
  • Write automated analysis pipelines

Course background I: Reproducibility

Two related ideas:

  1. Getting the same results with an independent experiment (replicable)

  2. Getting the same results given the same data (reproducible)


Our focus is on #2.

Course background I: Reproducibility (cont.)

“The most basic principle for reproducible research is: Do everything via code.”
—Karl Broman


Also important for reproducibility are:

  • Project organization and documentation (week 3)

  • Sharing your data and code (for code: Git & GitHub, week 4)

  • How you code (covered throughout)


Another motivator: working reproducibly will benefit future you!

Course background II: Efficiency and automation

  • Using code enables you to work more efficiently and automatically.
    This is particularly useful when you have to:

    • Do repetitive tasks

    • Recreate a figure or redo an analysis after adding a sample

    • Redo a project after uncovering a mistake in the first data processing step.
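
As a minimal sketch of the "redo an analysis after adding a sample" case: the snippet below is written in Python purely for illustration (the course itself uses the Unix shell, R, and Nextflow). The sample names and read counts are made up, and `mean_count` and `summarize` are hypothetical helper names, not part of any course material.

```python
# Hypothetical read counts per sample (made-up numbers, for illustration only)
read_counts = {
    "sampleA": [120, 98, 143],
    "sampleB": [87, 110, 95],
}


def mean_count(counts):
    """Return the mean of a list of read counts."""
    return sum(counts) / len(counts)


def summarize(samples):
    """Compute the mean read count for every sample in the dictionary."""
    return {name: mean_count(counts) for name, counts in samples.items()}


# First run of the analysis, with two samples
summary = summarize(read_counts)

# A sample is added later: only the data changes, and the same code is rerun
read_counts["sampleC"] = [101, 99, 100]
summary = summarize(read_counts)

# Print one tab-separated line per sample
for name, mean in summary.items():
    print(f"{name}\t{mean:.1f}")
```

Because the analysis lives in code, rerunning it with the extra sample takes one function call rather than repeating manual steps.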

Course background III: Omics data

Omics data is increasingly important in biology, and most notably includes:

  • Genomics
  • Transcriptomics
  • Proteomics
  • Metabolomics

The next lecture will introduce omics data in more detail.


What this course does and does not focus on

  • While we’ll be using some example omics datasets, this course will not comprehensively cover specific omics data analyses. Our focus is more on fundamental computational skills.

  • A highly recommended follow-up course to learn omics data analysis specifics:
    Genome Analytics (HCS 7004) by Jonathan Fresnedo-Ramirez

Course background IV: Applied bioinformatics

Also: computational biology

TBA

Course topics

The Unix shell & shell scripts

The Unix shell (or the “Terminal”) is a command-line interface to computers.

Being able to use the Unix shell is a fundamental skill when working with omics data, for example because much of the specialized analysis software must be run using the shell.


  • You’ll spend a lot of time with the Unix shell, starting next week.
  • You’ll also write shell scripts, and will use an editor called VS Code for this and other purposes.

Bash (shell language)

VS Code

Project organization & documentation

Good project organization & documentation are a necessary starting point for reproducible research.


  • You’ll learn best practices for project organization, file naming, etc.

  • You’ll learn how to manage your data and software

  • To document and report what you are doing, you’ll use Markdown files.


Markdown

Version control with Git and GitHub

Using version control, you can more effectively keep track of project progress, collaborate, share code, revisit earlier versions, and undo changes.


  • Git is the version control software we will use,
    and GitHub is the website that hosts Git projects (repositories).

  • You’ll also use Git + GitHub to hand in your graded assignments.



High-performance computing with OSC

Thanks to supercomputer resources, you can work with very large datasets at speed, running up to hundreds of analyses in parallel and using much larger amounts of memory and storage space than a personal computer has.


  • We will use OSC throughout the course, and you’ll get a brief intro to it this week
  • In week 5, you’ll learn how to manage data and software at OSC (e.g. with Conda)
  • In week 6, you’ll learn to submit shell scripts as OSC “batch jobs” with Slurm





Automated workflow management

Omics data analyses typically consist of many consecutive steps.

Using a workflow written with a workflow manager, you can run and rerun an entire analysis pipeline with a single command (and much more).


  • You’ll use the workflow language Nextflow to build your pipelines
  • You will also learn how to use comprehensive, best-practice omics data Nextflow pipelines produced by the nf-core initiative


The R language

While the Unix shell, and software that is run through the Unix shell, is best used for the initial (algorithmic) processing steps of omics data, R is probably the most prominent language in the more “downstream” and often statistical analysis and visualization of omics data.

In this course, you will learn the basics of R, how to visualize data in R, and how you can use specialized packages for omics data analysis.



R vs. Python

Python is also commonly used, but I believe that altogether, R is a far better choice for this course. On the other hand, Python is a great follow-up language to learn for those seeking to specialize in bioinformatics.

Using generative AI to help with coding

  • TBA


Course practicalities

Zoom

  • Be muted by default, but feel free to unmute yourself to ask questions any time.

  • Questions can also be asked in the chat.

  • Having your camera turned on as much as possible is appreciated!

  • “Screen real estate”: large or multiple monitors, or multiple devices, work best.

  • Be ready to share your screen.

Participatory live coding

TBA

Websites & Books

  • Info about CarmenCanvas website TBA


  • I am only using slides for this course intro and for the very next lecture on omics data. Other material will be presented via “regular” pages on the GitHub website, as this works better for the interactive live coding we’ll be doing.

Quick tour of the GitHub website

TBA

Books and other readings

  • Books:
    • Computing Skills for Biologists (“CSB”; Allesina & Wilmes 2019)
    • Bioinformatics Data Skills (“Buffalo”; Buffalo 2015)
  • Papers:
    • TBA
  • At the bottom of a number of lecture pages on the GitHub website are “Bonus” sections, which represent optional additional reading.

Office hours

TBA

Computer requirements

TBA

Homework and grading

What your grade is made up of

You can earn a total of 100 points across 6 assignments and 4 final project checkpoints.

Graded assignments

These are due on Mondays and are worth 10 points each:

  1. Shell basics (due week 3)
  2. Markdown & Git (due week 5)
  3. Shell scripting (due week 6)
  4. OSC batch jobs (due week 8)
  5. Nextflow (due week 11)
  6. R (due week 15)

The first one is submitted through CarmenCanvas, while all others are submitted via GitHub so you can get more practice with that.

Final project

Plan and implement a small computational project, with the following checkpoints:

  • I: Proposal (due week 13 – 5 points)

  • II: Draft (due week 15 – 5 points)

  • III: Oral presentations on Zoom (week 16 – 10 points)

  • IV: Final submission (due Dec 15 – 20 points)



Data sets for the final project

It is ideal if you have/develop your own idea for a data set and analysis — for example, that way you may do something that’s directly useful for your own research.

If not, I can provide you with a data set.

More information about the final project will follow later in the course.

Using generative AI for graded assignments

TBA

Ungraded homework

  • Weekly readings

  • Weekly exercises — I recommend doing these on Fridays after the week’s session.

  • Miscellaneous small assignments such as surveys and account setup.



Weekly materials & homework

I will try to add the materials for each week on the preceding Friday, or at least the week’s overview and readings.

None of this homework has to be handed in.

Weekly recitation on Monday

We will have an optional but highly recommended weekly recitation meeting on Mondays, during which we go over the exercises for the preceding week.


Practice is key!

This course is intended to be highly practical, and if you don’t practice the skills we will focus on by yourself, you will not get much out of it.


Please indicate your availability here: TBA

Rest of this week

  • Introduction to omics data

  • Introduction to the Ohio Supercomputer Center (OSC)

  • Homework:
    • TBA

Questions?




